library(tidyverse)Neighborhood demographics and distance from a library
A look into the distribution of libraries in Minneapolis
Cleaning Population Data
Race- US Decennial Census
At the block level. Race is quantified by “percent not white” - - in papers this is later called POC
##### Initial prep #####
#skips row with long, easy to understand question
race_df = read.csv("DECENNIALPL2020.P1_2022-11-29T164332\\DECENNIALPL2020.P1-Data.csv",header=T)[-1,]
# remove annotation columns
race_df <- race_df %>%
select(-contains("NA"))
# change from character to numeric
race_df[ , 2:72] <- apply(race_df[ , 2:72], 2,
function(x) as.numeric(as.character(x)))
######## combining columns based on race ########
race_df <- race_df %>%
mutate("nonWhite" = rowSums(race_df[,5:10], na.rm = TRUE)) %>%
rename("total" = P1_001N,
"totalOneRace" = P1_002N,
"white" = P1_003N,
"black" = P1_004N,
"nativeA" = P1_005N,
"asian" = P1_006N,
"pacificI" = P1_007N,
"other" = P1_008N,
"twoOrMore" = P1_009N
) %>%
mutate("percentNotWhite" =
ifelse(is.na(nonWhite/total),0,nonWhite/total))%>%
#select(GEO_ID,total,nonWhite,white,black,nativeA,asian,pacificI,other,twoOrMore,percentNotWhite)%>%
select(GEO_ID, percentNotWhite)%>%
arrange(GEO_ID) #idk this might help arcGISwrite.csv(race_df,"raceNonWhite3.csv")Loading data into ArcGIS
Made sure to document this for my own sake
Map setup
Add shapefile – Make sure it is the corresponding type (census block, block group, tract)
Add csv (You need to add it to your database first. Catalog-> databases-> project name-> right click, import)
Create calculated field for GEO_ID
Open attribute table
Create calculated field
Use this equation:
!GEO_ID![9:]– indexes column GEO_ID from the 9th characterIt is tempting to edit the GEO_ID field in R. I tried and wasted a lot of time because when the CSV is uploaded into ArcGIS, the column was read in as a numeric and it needs to be text to join to the shapefile. I wish there was an easy work around for this.
Join the csv to the shapefile on the new calculated field – I would click the Validate Join button before running the join. If the join is unsuccessful, the validate function will tell you why. It also takes a long time on this data set.
Add city boundaries shapefile. – I found Minneapolis in ArcGIS online
Clip map to city features
- Analysis-> tools-> clip
- Input features: .shp with data
- Clip features: city boundaries You should now have a new map object with only the blocks within the city
Add shapefile with library location data
Use Near function to create column with distance
- Analysis-> tools-> near
- Input features: clipped shp
- Near features: libraries Check the attribute table to make sure the columns were added
Export data
- clipped shp, right click-> data-> export table-> click on output table and add .csv at the end
Exporting the map
- Share-> export map
- Pick the preferred file format
Map Results
The colors were chosen to show contrast between the different groups.
The distances in the Distance from a library map were chosen due to walk-ability. Anything beyond half a mile is outside a reasonably easy walk.
I chose to make two maps to help visualize the distribution of race and practicality of visiting the library based on distance. I think the distance map makes it easier to identify spots that lack accessibility.
Analysis/Linear Regression
First, I do an inner join between the raw data from the census bureau and the output file from ArcGIS to keep only the rows that are located in Minneapolis. The model for the regression analysis was run in STATA, it was initially done in this document for a quick check.
# csv from the US census
original_df = read.csv("DECENNIALPL2020.P1_2022-11-29T164332\\DECENNIALPL2020.P1-Data.csv",header=T)[-1,]
# csv output from ArcGIS
race_dist = read.csv("raceWithDistanceGEODESIC.csv")
# creating a new df so the csvs don't need to be read in everytime I run the code
newRace_df <- merge(original_df, race_dist, by.x = "GEO_ID", by.y = "raceNonWhite3_GEO_ID") %>%
rename("total" = P1_001N,
"totalOneRace" = P1_002N,
"white" = P1_003N,
"black" = P1_004N,
"nativeAmerican" = P1_005N,
"asian" = P1_006N,
"pacificIslander" = P1_007N,
"other" = P1_008N,
"twoOrMore" = P1_009N,
"distance" = NEAR_DIST
)%>%
select(GEO_ID,total,white,black,nativeAmerican,asian,pacificIslander,other,twoOrMore, distance) %>%
mutate(distance = distance *0.000621)
# change from character to numeric
newRace_df[ , 2:9] <- apply(newRace_df[ , 2:9], 2,
function(x) as.numeric(as.character(x)))
newRace_df <- newRace_df %>%
mutate("nonWhite" = rowSums(newRace_df[,4:9], na.rm = TRUE)) %>%
mutate("percentNotWhite" =
ifelse(is.na(nonWhite/total),0,nonWhite/total))
race_dist_lm = lm(distance~percentNotWhite + total, data = newRace_df)
summary(race_dist_lm)
Call:
lm(formula = distance ~ percentNotWhite + total, data = newRace_df)
Residuals:
Min 1Q Median 3Q Max
-0.78983 -0.27839 -0.01976 0.24309 1.28946
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.898e-01 7.197e-03 109.750 < 2e-16 ***
percentNotWhite -1.832e-01 1.745e-02 -10.499 < 2e-16 ***
total -1.561e-04 4.653e-05 -3.354 0.000802 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3692 on 5961 degrees of freedom
Multiple R-squared: 0.02458, Adjusted R-squared: 0.02426
F-statistic: 75.12 on 2 and 5961 DF, p-value: < 2.2e-16
summary(newRace_df) GEO_ID total white black
Length:5964 Min. : 0.0 Min. : 0.00 Min. : 0.00
Class :character 1st Qu.: 23.0 1st Qu.: 6.00 1st Qu.: 0.00
Mode :character Median : 61.0 Median : 35.00 Median : 2.00
Mean : 73.1 Mean : 43.59 Mean : 13.88
3rd Qu.: 85.0 3rd Qu.: 57.00 3rd Qu.: 12.00
Max. :3228.0 Max. :2250.00 Max. :2982.00
nativeAmerican asian pacificIslander other
Min. : 0.000 Min. : 0.000 Min. :0.00000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:0.00000 1st Qu.: 0.000
Median : 0.000 Median : 0.000 Median :0.00000 Median : 0.000
Mean : 1.253 Mean : 4.251 Mean :0.03387 Mean : 4.246
3rd Qu.: 1.000 3rd Qu.: 4.000 3rd Qu.:0.00000 3rd Qu.: 4.000
Max. :464.000 Max. :416.000 Max. :4.00000 Max. :244.000
twoOrMore distance nonWhite percentNotWhite
Min. : 0.000 Min. :0.0000 Min. : 0.00 Min. :0.00000
1st Qu.: 1.000 1st Qu.:0.4453 1st Qu.: 3.00 1st Qu.:0.06441
Median : 4.000 Median :0.7016 Median : 12.00 Median :0.20370
Mean : 5.838 Mean :0.7243 Mean : 29.51 Mean :0.29530
3rd Qu.: 8.250 3rd Qu.:0.9670 3rd Qu.: 33.00 3rd Qu.:0.47115
Max. :122.000 Max. :1.9976 Max. :3166.00 Max. :1.00000
Graph Creation
ggplot(newRace_df, aes(x=percentNotWhite,y=distance, color = percentNotWhite)) +
geom_point(alpha = .4, show.legend = FALSE) +
geom_smooth(method = lm, color = "red", fill = "transparent") +
labs(title = "Percent POC and Distance (miles)",
x = "Percent POC",
y = "Miles") +
theme_minimal() +
scale_color_gradient(low = "yellow", high = "red") +
theme(plot.title = element_text(hjust = .5, family = "mono"),
axis.title = element_text(family = "mono"),
axis.text = element_text(family = "mono"))Summary statistics and totals
totals_df <- newRace_df %>%
select(total:twoOrMore)%>%
pivot_longer(cols = everything(), names_to = "Category", values_to = "val") %>%
group_by(Category) %>%
summarise(Total = sum(val)) %>%
arrange(Total)
totals_df# A tibble: 8 × 2
Category Total
<chr> <dbl>
1 pacificIslander 202
2 nativeAmerican 7473
3 other 25325
4 asian 25351
5 twoOrMore 34819
6 black 82809
7 white 259983
8 total 435962